Process for automatically extracting information of a predefined type from a document
Patent abstract:
A method and system are provided for automatically extracting information of a predefined type from a document. The method includes using an object detection algorithm to identify at least one segment of the document that likely includes information of the predefined type. The method further comprises constructing at least one bounding box corresponding to said at least one segment and, if the bounding box likely includes the predefined type information, extracting the information comprised by that bounding box. Figure for abstract: Fig. 1
Publication number: FR3098328A1
Application number: FR1907252
Filing date: 2019-07-01
Publication date: 2021-01-08
Inventors: Sebastian Andreas Bildner; Paul Krion; Thomas Stark; Martin Christopher Stämmler; Martin Von Schledorn; Jürgen Oesterle; Renjith Karimattathil Sasidharan
Applicant: Amadeus SAS
Patent description:
[0001] The present invention generally relates to the extraction of information, in particular the extraction of information of a predefined type from documents. Information of a predefined type is, for example, information from a data sheet relating to technical items (semiconductors, etc.), receipt information (a total amount, etc.), and other such information.
[0002] "YOLO9000: Better, Faster, Stronger" by Joseph Redmon and Ali Farhadi, University of Washington, Allen Institute for Artificial Intelligence (arXiv:1612.08242v1 [cs.CV], 25 December 2016), deals with real-time object detection on images and with the ability to detect various categories of objects.
[0003] Data entry corresponding to information of a predefined type is often done manually and is currently aided by conventional optical character recognition (OCR).
[0004] According to a first aspect, a method is provided for automatically extracting information of a predefined type from a document. The method includes using an object detection algorithm to identify at least one segment of the document that likely includes information of the predefined type, hereinafter referred to as a "segment of interest". The method further includes constructing at least one bounding box corresponding to said at least one segment and, in response to identifying a bounding box presumably including the predefined type information, extracting the information comprised by that bounding box.
[0005] According to a second aspect, a computer system comprising at least one computer arranged to execute the method according to the first aspect is provided.
[0006] According to a third aspect, a computer program product is provided, comprising program code instructions stored on a computer-readable medium, in order to implement the steps of the method according to the first aspect when said program runs on a computer.
[0007] A method is provided for automatically extracting information of a predefined type from a document. As mentioned above, information of a predefined type is, for example, information to be retrieved from the documentation or technical data sheet of a prefabricated part, from a receipt, or other such information. This information could be the radius of a wheel or a short-circuit current when the information is to be extracted from the documentation or technical data sheet of a prefabricated part. The information could be numbers such as a receipt number, VAT ID, or total amount when the information is to be extracted from a receipt, or other such information.
[0008] The method includes using an object detection algorithm to identify at least one segment of interest. A specialized OCR solution is thus applied only to significant areas of the document. By exploiting the visual structure of a document in this way, identifying and extracting important objects on receipts, such as total amounts and creation dates, can be accomplished in an essentially language-neutral way. When the visual structure of the document is exploited, the method could, for example, also cope with incomplete text fragments and low-resolution images, since only information of a predefined type is targeted and the rest of the document can be excluded from character recognition. Recognition can then run, for example, on low-resolution images, reducing processing time and memory requirements, which makes the method potentially applicable to mobile devices.
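By way of illustration, the two-stage pipeline described above (locate segments of interest, then read only those crops) could be sketched as follows. This is a minimal sketch, not the patented implementation: the `locate_segments` and `read_characters` callables and the 0.5 threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: int
    y: int
    w: int
    h: int
    score: float  # probability that the box contains the target information

def extract_predefined_type(image, locate_segments, read_characters, threshold=0.5):
    """Stage 1: locate candidate bounding boxes; stage 2: read each crop."""
    results = []
    for box in locate_segments(image):          # object detection network
        if box.score < threshold:               # keep only likely candidates
            continue
        crop = image[box.y:box.y + box.h, box.x:box.x + box.w]  # crop to the box
        results.append(read_characters(crop))   # character recognition network
    return results
```

Because the rest of the document never reaches `read_characters`, the expensive recognition step runs only on a few small crops, which is what makes the approach attractive on mobile hardware.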
[0009] The automatic information extraction method involves constructing at least one bounding box corresponding to said at least one segment and, in response to identifying that a bounding box likely includes the predefined type information, extracting the information comprised by that bounding box.
[0010] As an example, an object detection algorithm based on convolutional neural networks is used to detect information of a predefined type, for example on a scanned image of the document. This produces bounding boxes around candidate locations of the predefined type information on the document. It can also provide each candidate with a probability of matching the predefined type information to be extracted. Therefore, in some examples, a probability value is assigned to a bounding box, the probability value being indicative of the probability that a certain bounding box contains the predefined type information.
[0011] Furthermore, an object detection algorithm based on convolutional neural networks is used, for example, for the optical character recognition itself. The result of this processing step includes a bounding box for each detected character and probabilities for the character classification. The first and second object detection algorithms are used, for example, to extract the values of detected objects as well as confidence values.
[0012] In some examples, a character identification algorithm is used to extract the predefined type information in said at least one bounding box. In some examples, the character identification algorithm used to extract the predefined type information is configured to use characteristics of said predefined type information to recognize it.
[0013] Higher character recognition accuracy can be achieved by using a specialized, bespoke OCR solution. The OCR solution could therefore be configured for single-page documents, multiple numbers on a single page, small amounts of continuous text, low image quality, and a limited character set. As an example, fully convolutional networks designed for object detection are used for OCR. Object detection algorithms based on convolutional networks can handle certain document image degradations (reduced contrast, low resolution, etc.) better than traditional OCR solutions by getting rid of the binarization and character segmentation steps. They are also faster, since they combine the character classification and localization steps into a single evaluation running on the graphics processing unit (GPU).
[0014] In some examples, the used characteristics of said information of a predefined type include at least one of a comma or decimal point position and a number format.
[0015] In some examples, the neural network is a multi-layered neural network in which each layer serves to identify different features of the document.
[0016] In some examples, the method includes a training activity to train the neural network with a plurality of documents to correctly extract the predefined type information.
[0017] The desired information of a predefined type used for training is, for example, a total amount on a receipt. However, by way of example, any other data, such as numbers of a specific format on a technical data sheet, may be used. In order to train the network to recognize this desired information, a training set is generated by applying OCR to document images in order to recognize the text on these images as well as the bounding boxes for each character.
[0018] For example, to produce ground truth data, regular expressions and other rule-based grammars and approaches are used to find occurrences inside the OCR text. This is used, for example, for amounts, VAT IDs, or other data elements with a very strict and characteristic syntax.
[0019] Depending on the type of data to be recognized, feedback data containing information of a predefined type may be used. Feedback data may be obtained from users of the method by comparing the extracted predefined type information with the predefined type information actually present. This feedback data is reliable, since the user must confirm the accuracy of the submitted data.
[0020] For example, the user provides an arrival date for a hotel bill. This value is normalized so that various type-dependent notations for this value ("2019-01-18", "18. Jan 2019", etc.) can be generated, which the method then tries to identify in the OCR text.
[0021] From the matches found, for example, per-character bounding boxes are extracted, which are used as ground truth for training the two object detection phases (the bounding box localization phase and the character detection phase). If no occurrence of a date of a particular type can be found in the OCR text, the document is not considered for training on that data type.
[0022] Predefined type information position detection phase: an object detection network trained to detect the positions of desired elements on the document image is applied. The result is a set of bounding boxes describing the positions of interest on the document, together with, in some examples, the type of information detected (e.g., amount, date, etc.).
[0023] The bounding box describes the position and size of the information. It is used to crop the image to the dimensions of the bounding box, resulting in a fairly small image presumably containing only the image of the predefined type information. Then another convolutional network trained for character recognition is applied (see below).
[0024] Character detection phase: this exemplary approach provides the effect that OCR can be customized for specific use cases, since the expected set of characters/words corresponding to the predefined type information is known. It is also faster than applying custom OCR to the entire document, especially on mobile devices with limited computational capacity, in instances where the method is implemented on mobile devices.
[0025] For example, for the character identification activity, a different object detection algorithm is used than that used for the identification of the segment(s) of interest.
[0026] In some examples, a convolutional neural network, in particular a fully convolutional neural network, is used by the object detection algorithm and/or the character identification algorithm.
[0027] In one example, a first custom convolutional neural network designed to identify the segment(s) of interest is applied, and a second custom convolutional neural network designed for character identification in the previously identified segments is used to extract the information.
[0028] Character identification is therefore, for example, based on a fully convolutional network. The result, again as an example, includes bounding boxes with classifications and probabilities, similar to the localization step described above. Each bounding box can describe the position of one character. The classification can indicate which character is present in the bounding box.
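The ground-truth generation of paragraphs [0018] to [0020] could be sketched as follows: a user-confirmed value is expanded into several notations that are searched for in the OCR text, alongside a strict regular expression for amounts. The concrete patterns and date formats here are assumptions for illustration, not taken from the patent.

```python
import re
from datetime import date

AMOUNT_RE = re.compile(r"\d{1,3}(?:[.,]\d{3})*[.,]\d{2}")  # strict amount syntax

def date_variants(d: date) -> list[str]:
    """Generate type-dependent notations for a confirmed date value."""
    return [
        d.strftime("%Y-%m-%d"),   # "2019-01-18"
        d.strftime("%d. %b %Y"),  # "18. Jan 2019"
        d.strftime("%d/%m/%Y"),   # "18/01/2019"
    ]

def find_ground_truth(ocr_text: str, confirmed: date) -> list[tuple[int, int]]:
    """Return character offsets of matches usable as training ground truth."""
    spans = [m.span() for m in AMOUNT_RE.finditer(ocr_text)]
    for variant in date_variants(confirmed):
        idx = ocr_text.find(variant)
        if idx != -1:
            spans.append((idx, idx + len(variant)))
    return spans  # empty -> document not used for training on this data type
```

From such character offsets, the per-character bounding boxes delivered by the OCR step can then be selected as training targets, and documents without any match are skipped, as described in [0021].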
[0029] In a following activity, for example, all resulting bounding boxes are collected. If two boxes overlap too much, only the box with the highest probability may be kept (a sketch of this pruning step follows this passage). The bounding boxes are then ordered, for example, by their horizontal position on the cropped image, that is, on the segment of the document that presumably contains the predefined type information comprised by the currently examined bounding box. For example, it is assumed that all characters are located in the same line of text. For each bounding box B, a subset of the character set, consisting of all characters that may occur in that position, can be determined. The subset is, for example, determined by syntax constraints and by other constraints, such as a constraint relating to the format and/or formation of characters (a valid calendar date, etc.).
[0030] In some examples, a probability value is assigned to a character identified by the character identification algorithm, the probability value being indicative of the probability that the identified character is identical to a character actually comprised by the predefined type information. From the aforementioned subset of characters, the character with the highest probability, as determined by the object detection algorithm, can be chosen.
[0031] In some examples, a probability value assigned to a bounding box and the probability values assigned to characters within that bounding box are used to provide a combined confidence score. For example, if the product of the probabilities of all characters detected in a bounding box B is greater than a threshold, that sequence of characters is accepted as a match. The consequence is that longer matches may have a lower probability, which suits some applications, because longer matches have a higher risk of containing a mismatch, and even one mismatched character renders the match as a whole useless. This is, for example, the case for fields of application such as extracting numbers from a data sheet and extracting a total amount from receipts.
[0032] Return of results to the user: the value extracted in this way is then presented, for example, to the user for confirmation that the extracted value corresponds to the information of a predefined type actually present on the document. The extracted values are, for example, presented to the user on the screen of a mobile device, a tablet, or another device of this kind. The extracted value is, for example, used to prefill a form in which the user must enter the extracted value among other data; the entire form, together with the extracted information entered, could be stored on the mobile device for later evaluation of the data. As an exemplary use case, the prefilled form can encompass manually entered values and the extracted value read from the data sheet. In another exemplary use case, the retrieved value may be a total amount from a receipt, and the user may enter any other relevant expense data into the form. If the user determines the recognized value to be erroneous, he may change it at his discretion. Eventually, the user may need to confirm that all values in the form are correct.
[0033] The data collected in the form is then uploaded, for example, to a back-end system. Subsequently, an auditor can verify the uploaded data and compare it to the document from which the predefined type information was extracted. This manually validated data can serve as a kind of feedback data used to train the neural network involved in object and character detection on an ongoing basis.
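The box-collection step described in paragraph [0029] above could be sketched as follows: heavily overlapping character boxes are pruned so that only the most probable one survives, and the survivors are ordered left to right. This is a minimal sketch; the IoU helper and the 0.5 overlap threshold are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class CharBox:
    x: float
    y: float
    w: float
    h: float
    char: str
    prob: float

def iou(a: CharBox, b: CharBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0

def collect_boxes(boxes: list[CharBox], overlap: float = 0.5) -> list[CharBox]:
    """Keep the most probable box among heavily overlapping ones, sort by x."""
    kept: list[CharBox] = []
    for box in sorted(boxes, key=lambda b: -b.prob):
        if all(iou(box, k) < overlap for k in kept):
            kept.append(box)
    return sorted(kept, key=lambda b: b.x)
```

The resulting left-to-right sequence is what the per-position character subsets and the combined confidence score of [0030] and [0031] are then applied to.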
[0034] The actual recognition stage of the method could run in the back-end system or in the document scanning device itself, e.g. a mobile phone with a camera or another such device. In the event that recognition is performed in the back-end system, the photographed document is uploaded for recognition directly after the photo is taken. In case the recognition is performed in the document scanning device itself, the extracted information could be stored permanently on the document scanning device for later evaluation of the extracted data. Other use cases are also possible, such as a use case in which the user uploads the document to the system via a website. The same workflow can be used here: the fields of the form, which the user must fill in, are, for example, pre-filled with said one or more extracted values.
[0035] Structure and function of the neural network involved: in some examples, a first layer of the neural network is assigned to differentiating between empty and non-empty areas of a document and is further used to identify basic patterns present in the document, and a second layer of the neural network is used to identify shapes that are more complex in comparison with the aforementioned basic patterns.
[0036] The exact architecture of the network is not of paramount importance, as it could be replaced by different architectures as long as the definition of the final "decision layer" (see description below) remains the same.
[0037] Thus, in some examples, the neural network is a neural network compatible with a decision layer, the decision layer being a neural network layer that is used to detect at least one of (i) the position of a bounding box, (ii) the height and width of a bounding box, and (iii) a classification score indicating a classification of a detected character. For example, a YOLOv2 model, trained from scratch, can be used.
[0038] The final decision layer of the network is, for example, a convolutional layer. This layer can consist of filters with width = 1 and height = 1. This convolutional layer effectively works as a regression layer. These kernels are, for example, arranged in a grid of a certain size (for fully convolutional networks, this size depends on the dimensions of the captured image), effectively dividing the image along this grid. The 1 x 1 kernel in each of these grid cells can contain one detection for each anchor box. Each detection contains, for example, the following information: the exact position of the center of the bounding box, the height and width of the bounding box, a so-called "object presence score" (a score that determines whether or not a bounding box contains an object), and classification scores.
[0039] The depth of each 1 x 1 kernel is, for example, B * (5 + C), where B is the number of candidates that the model predicts for each cell (which can be chosen freely; the default being 5).
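The geometry of this decision layer can be made concrete with a small sketch: the output is a B * (5 + C)-deep vector per grid cell, reshaped so that each cell holds B candidate detections of five box values plus C class scores. The grid size below follows the 13 x 5 final feature map described later in the text; the counts B and C are illustrative assumptions.

```python
import numpy as np

B, C = 5, 36            # candidates per cell; classes (e.g., digits + letters)
GRID_H, GRID_W = 5, 13  # grid dimensions, assumed from the final feature map

# Raw decision-layer output: one (B * (5 + C))-deep vector per grid cell.
raw = np.random.randn(GRID_H, GRID_W, B * (5 + C))

# One detection per candidate: (t_x, t_y, t_w, t_h, objectness, C class scores).
detections = raw.reshape(GRID_H, GRID_W, B, 5 + C)
print(detections.shape)  # (5, 13, 5, 41)
```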
[0040] The values predicted for each candidate are, for example, the aforementioned "object presence score", corresponding for example to the intersection over union between the ground truth and the candidate's bounding box prediction (the intersection over union is a metric for evaluating bounding boxes, the ground truth corresponding to the actual bounding boxes to be detected), and four values (t_x, t_y, t_w, t_h) indicating the position and shape of the candidate's bounding box. The number of values predicted for each candidate is therefore, for example, five. In addition, C values are reserved for the conditional probabilities of the object classification (corresponding to the classification scores), given that there is an object. C is the number of classes that the model is able to differentiate. These classes can correspond to alphanumeric, Arabic, Chinese, or Japanese characters that the model must differentiate. For example, there may be a 90% probability that the value to be extracted matches the character "B" and a 10% probability that it matches the character "8".
[0041] The horizontal position of the predicted bounding box is, for example, calculated as b_x = σ(t_x) + c_x, where t_x is the predicted value and c_x is the horizontal offset of the cell (kernel) in the two-dimensional grid. The vertical position b_y is, for example, defined in the same way, as b_y = σ(t_y) + c_y. In this example, if the predicted values t_x and t_y are 0, the center of the predicted bounding box is located exactly at the center of the grid cell. In this example, the cell containing the center point of the bounding box is therefore responsible for detecting the object.
[0042] The width b_w and the height b_h of the predicted bounding box are, for example, defined as b_w = p_w · e^(t_w) and b_h = p_h · e^(t_h), where p_w and p_h are the prior width and height of the bounding box, while t_w and t_h are the predicted values. In this example, if the predicted values t_w and t_h are 0, the predicted bounding box is exactly equal to the prior.
[0043] This exemplary definition implies that in this example B prior/anchor boxes are defined, one for each prediction per cell. In order to choose these priors, for example, the ground truth bounding boxes are grouped into B clusters according to their width and height. For example, for each of these clusters a pair p_h and p_w is chosen such that the average values of t_w and t_h needed to generate the ground truth boxes in the cluster are minimized.
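A sketch of the bounding box decoding in [0041] and [0042], following the YOLOv2 formulas cited by the description: b_x = σ(t_x) + c_x and b_w = p_w · e^(t_w). Units are grid cells; the concrete numbers are illustrative.

```python
import math

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Map raw predictions (t_*) to a box, given cell offsets and anchor priors."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    b_x = sigmoid(t_x) + c_x        # center x, constrained to the cell
    b_y = sigmoid(t_y) + c_y        # center y, constrained to the cell
    b_w = p_w * math.exp(t_w)       # width relative to the anchor prior
    b_h = p_h * math.exp(t_h)       # height relative to the anchor prior
    return b_x, b_y, b_w, b_h

# With all-zero predictions the box sits at the cell center with the prior size.
print(decode_box(0, 0, 0, 0, c_x=3, c_y=1, p_w=2.0, p_h=1.5))
# -> (3.5, 1.5, 2.0, 1.5)
```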
[0044] Convolutional neural network (CNN) layers: a layer in a two-dimensional convolutional network includes, for example, multiple kernels. When a kernel is applied to an image, the kernel moves, for example, across the image, which is likewise represented as a two-dimensional matrix of much larger size. The kernel is applied successively to each patch until the entire image is covered. The image may correspond to a photo or a scan of a document from which the information of the predefined type must be extracted.
[0045] The application consists of multiplying each value in the kernel by the corresponding value in the image patch and then adding up all the results.
[0046] Example: let the kernel be F and the image patch be I, with
F =
-1  1
-1  1
-1  1
I =
0.3  0.8  0.1
0.1  0.9  0.2
0.2  0.8  0.1
In the middle of the image patch, there is a vertical line, indicated by the higher values (assuming that higher values mean darker colors). The kernel is defined in such a way that it detects left borders. Applying the kernel to the left part of the patch gives (-1*0.3)+(-1*0.1)+(-1*0.2)+(1*0.8)+(1*0.9)+(1*0.8) = 1.9. If the filter is moved one position to the right, the result is (-1*0.8)+(-1*0.9)+(-1*0.8)+(1*0.1)+(1*0.2)+(1*0.1) = -2.1, which is much lower. Once the kernel has been applied to the entire image, the values produced form an intensity map highlighting the positions with vertical lines. This result of the convolution is then passed through an activation function. Since a layer consists of multiple kernels rather than one, the result of each layer is a stack of these activation maps. If the next layer is another convolutional layer, it is applied to the activation maps as before, but its kernels have a depth equal to the stack size of their input.
[0047] Max pooling layers: some convolutional neural networks that can be used in the method, such as YOLOv2, use a max pooling technique in order to reduce the dimensions of an activation map. When applying max pooling, a kernel of predefined size moves across the input, passing through the highest values and eliminating the others. Another method to reduce the dimensions of such an intensity map is, for example, to increase the stride of the kernel applications of the convolutional neural network.
[0048] As an example, let I be the 4 x 4 activation map of a previous convolutional layer, to which a max pooling layer of size 2 x 2 with a stride of two is applied.
[0049] The pooling kernel divides I into four non-overlapping 2 x 2 sub-arrays.
[0050] From each of the four sub-arrays, the element with the highest value is selected.
[0051] The selected maxima form the resulting 2 x 2 matrix M of the max pooling layer, as illustrated by the sketch following this passage.
[0052] After the max pooling layer has reduced the data to be processed, another filter kernel can be used on the activation map with reduced dimensions, corresponding to a reduced number of entries. Max pooling layers can be used before the decision layer for segment-of-interest identification and/or before the decision layer of the actual character recognition activity. The decision layer could then be the layer that provides the final result to be entered in the aforementioned form, for example.
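A numeric sketch of the 2 x 2, stride-2 max pooling described in [0048] to [0051]; the values of I are illustrative assumptions, since the original example matrices are not reproduced in this text.

```python
import numpy as np

I = np.array([[0.1, 0.4, 0.2, 0.3],
              [0.9, 0.2, 0.8, 0.1],
              [0.3, 0.5, 0.2, 0.6],
              [0.1, 0.7, 0.4, 0.2]])

# Split I into four non-overlapping 2 x 2 sub-arrays and keep each maximum.
M = I.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(M)
# [[0.9 0.8]
#  [0.7 0.6]]
```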
[0053] Examples of the invention will now be described with reference to the accompanying drawings, in which:
[0054] FIG. 1 is an exemplary document from which total amount information is to be extracted, together with bounding boxes around candidates for this information;
[0055] FIG. 2 shows a schematic sequential flow of the information extraction method;
[0056] FIG. 3 shows two successive multilayer neural networks with multiple different kernels used to extract information, both multilayer neural networks involving a decision layer;
[0057] FIG. 4 is a schematic view of an exemplary decision layer of FIG. 3 as well as possible outputs of this decision layer;
[0058] FIG. 5 is a schematic sequential flow of an example of the method including neural networks trained as illustrated in FIG. 3;
[0059] FIG. 6 is a schematic view of a mobile device which is used to implement the information extraction method;
[0060] FIG. 7 is a schematic computer on which the method of extracting information from a document can be implemented.
[0061] The drawings and the description of the drawings are examples of the invention and do not represent the invention itself. Like reference signs refer to like elements throughout the following description of the examples.
[0062] An exemplary document 1, from which total amount information is to be extracted, together with bounding boxes around the candidates for this information, is shown in FIG. 1. The exemplary document 1 shown in FIG. 1 is a receipt from a parking garage, hotel, or other such entity. Exemplary document 1 comprises, e.g., Chinese characters indicating the purpose of the receipt and several numbers with different purposes (e.g., a date, a serial number, a total amount, etc.). The digitized numbers of the document shown in FIG. 1 are enclosed in bounding boxes identified, for example, by an object identification algorithm based on a fully convolutional neural network.
[0063] The object detection algorithm could, for example, recognize that, based on the format of the number and the number of characters in a string, the content of the bounding boxes 100 probably does not match the total amount, which is the information of a predefined type sought in this example. This could be accomplished, for example, by assigning to each bounding box identified on the scanned document a probability of matching a total amount and comparing that probability to a threshold.
[0064] However, the object detection algorithm could also recognize that, based on the format of the number and the number of digits in a number, the contents of the bounding boxes 200 could indeed be a total amount. This could also be accomplished by identifying probability values associated with bounding boxes and comparing the probabilities to a threshold.
[0065] A schematic sequential flow of an example of the information extraction method is shown in FIG. 2. During activity S1, an object detection algorithm is used to identify the segment(s) of interest. The predefined type information could be the total amount value of the receipt document 1 shown in FIG. 1. The predefined type information could also be technical specification data on a data sheet or other such data.
[0066] During activity S2, at least one bounding box corresponding to said at least one segment is constructed. The bounding box, for example, surrounds each segment that presumably includes the predefined type information. In the exemplary document shown in FIG. 1, these segments are segments which include numbers of all kinds and not language characters, such as the Chinese characters in the document of FIG. 1.
[0067] In an activity S3, the predefined type information (here: the total amount) is extracted from said at least one bounding box by a character identification algorithm configured to use the characteristics of said predefined type information (information concerning commas, decimal points or the position of punctuation marks, number formats, SI units, etc.) in order to recognize the information. For example, to recognize the information corresponding to the total amount on the document shown in FIG. 1, a format with a limited number of digits (e.g., three digits) and no more than a single punctuation mark is used as a criterion.
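The format criterion of [0067] could be sketched as follows: a candidate string is treated as a possible total amount only if it consists of a limited number of digits with at most one punctuation mark used as a decimal separator. The exact rule is an assumption for illustration.

```python
import re

TOTAL_AMOUNT_RE = re.compile(r"^\d{1,3}(?:[.,]\d{1,2})?$")

def looks_like_total_amount(candidate: str) -> bool:
    """Check the number format constraint used to filter candidate boxes."""
    return TOTAL_AMOUNT_RE.fullmatch(candidate) is not None

print(looks_like_total_amount("128.50"))   # True
print(looks_like_total_amount("2019-07"))  # False: wrong format
```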
[0068] Two successive multilayer neural networks with multiple different kernels used for information extraction involving object identification are illustrated by way of example in FIG. 3. Both multilayer neural networks involve many kernels and a decision layer.
[0069] In an input layer 21 of the object detection neural network, having a depth of three corresponding to the three RGB channels, the document 1 is converted into a set of values of dimensions 52 x 20 x 3.
[0070] The next two successive layers 22, 23 are convolutional layers, both having dimensions 20 x 52 x 64. In the first convolutional layer 22, a filter kernel having dimensions 3 x 3 x 64 is applied, while in the second convolutional layer 23, a filter kernel having dimensions 2 x 2 x 64 is applied. The two successive layers could be used for the detection of rudimentary shapes on the image, or for another use of this kind.
[0071] The next layer 24 is a first max pooling layer that reduces the width and height of the value set to the dimensions 26 x 10. This reduction can, for example, serve to focus only on the regions of the aforesaid value set which correspond to parts of the image that are not empty. A filter having dimensions 3 x 3 x 64, serving, e.g., for abstract object detection, is also applied.
[0072] As a next activity, again two successive convolutional layers 25, 26 are applied to the set with reduced dimensions. The depth of the corresponding filter and of the set of values is increased to 128. These layers could be used for more accurate recognition of shapes and regions.
[0073] Layer 27 is a second max pooling layer that reduces the width and height to 13 x 5 and additionally applies another filter kernel of 3 x 3 x 128.
[0074] The subsequent layers 28, 29 and 30 are further convolutional layers. Layer 28 corresponds to a set having dimensions 5 x 13 x 256 and applies a filter kernel of 3 x 3 x 256. Layer 29 has the same dimensions, but a filter kernel of 1 x 1 is applied. The last layer before the decision layer 31 is the convolutional layer 30. The convolutional layer 30 is a deep filtering layer corresponding to a set with a depth of 1024. A filter kernel with dimensions 1 x 1 x 1024 is applied.
[0075] Each layer can serve as an activation map including activation values for the neurons associated with the layers of the neural network. The activation values emerging from these activation maps are, for example, fed into an activation function, such as a rectifier or sigmoid function, to form the activation potential of a respective neuron.
[0076] The next layer 31 is the decision layer of the object detection algorithm, which ultimately defines the bounding boxes. The decision layer of such a neural network is also described in connection with FIG. 4.
[0077] The result of this decision layer is, in particular, bounding boxes with corresponding probability values indicating the probability that the bounding box indeed contains the predefined type information to be extracted, in this example the value of the total amount.
[0078] The result of the decision layer 31, e.g. the bounding box with the highest probability of containing the predefined type information, is fed into the first detection layer of a character identification algorithm 32.
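A rough PyTorch sketch of the backbone just described. Kernel sizes and channel depths follow the text; padding and stride choices are assumptions made so that the tensor sizes roughly track the stated 20 x 52 → 10 x 26 → 5 x 13 reduction, and the decision layer follows the B * (5 + C) depth of [0039].

```python
import torch
import torch.nn as nn

B, C = 5, 36  # candidates per cell and number of character classes (assumed)

backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),    # layer 22
    nn.Conv2d(64, 64, kernel_size=2, padding=1), nn.ReLU(),   # layer 23
    nn.MaxPool2d(2),                                          # layer 24
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),  # layer 25
    nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(), # layer 26
    nn.MaxPool2d(2),                                          # layer 27
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), # layer 28
    nn.Conv2d(256, 256, kernel_size=1), nn.ReLU(),            # layer 29
    nn.Conv2d(256, 1024, kernel_size=1), nn.ReLU(),           # layer 30
    nn.Conv2d(1024, B * (5 + C), kernel_size=1),              # decision layer 31
)

out = backbone(torch.randn(1, 3, 20, 52))  # one RGB crop of the document
print(out.shape)  # torch.Size([1, 205, 5, 13]) with these assumptions
```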
[0079] The following convolutional layers 33, 34, 36, 37, 39, 40, 41, the max pooling layers 35, 38 and the decision layer 42 are identical with respect to the dimensions of the filter sets and kernels and the sequence of layers.
[0080] However, these convolutional neural network layers are adapted to character identification within the previously identified bounding box. As described above, character identification may also involve the construction of bounding boxes, including one bounding box for each character. For each bounding box, a subset of the character set, corresponding to the characters that are allowed to appear in a certain position, could be determined. Also, a probability value for a character to match the character actually present on the document is assigned.
[0081] In this way, each digit of the total amount is, for example, identified, and the total amount is returned to the user after the decision layer. The probability values of the characters in the bounding box could be multiplied with each other and, if the resulting probability is higher than a threshold value, the corresponding combination of characters is, for example, accepted as a match for the value of the total amount.
[0082] All of the neural network layers described above could be implemented as layers of a fully convolutional neural network, such as YOLOv2.
[0083] The decision layers 31, 42 and their respective results are illustrated in FIG. 4 in more detail. The decision layer 31, 42 has a filter kernel with a height and width of 1 x 1. The results of the illustrated exemplary decision layers 31, 42 include the exact position of the center of the bounding box D1, the width and height of the bounding box D2, the aforementioned object presence score D3 and the aforementioned classification scores D4.
[0084] An exemplary method of extracting information of a predefined type, with exemplary training performed in parallel, is shown in FIG. 5.
[0085] In an activity T1, a first object detection algorithm is applied to the training sheet. The first object detection algorithm can be a custom algorithm to detect the segment(s) of interest and to obtain the bounding boxes surrounding those segments.
[0086] During an activity T2, the bounding boxes surrounding these positions of interest are obtained. In addition to these bounding boxes, during the T2 activity, the type of the detected information is also obtained (e.g., amount, date).
[0087] During an activity T3, a second object detection algorithm, which is a character identification algorithm based on a fully convolutional neural network, is applied.
[0088] During an activity T4, the bounding boxes, with the character classifications and the corresponding probabilities for the classification to be correct, are obtained.
[0089] During an activity T5, all bounding boxes are collected from the result.
[0090] During an activity T6, the bounding boxes are ordered according to their horizontal position on the scanned document.
[0091] During an activity T7, a character subset is obtained for each bounding box. The character subset includes all characters that are allowed to appear in the particular position of the bounding box. The subset is determined based on syntax and/or format constraints.
[0092] During an activity T8, for each bounding box, the character with the highest probability is selected.
[0093] During an activity T9, in response to results indicating that the probabilities are above a threshold, the character sequence is accepted as a match.
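A minimal sketch of activities T6 to T9: boxes are ordered left to right, the per-position character subset constrains the classification, the best remaining character is chosen, and the product of the probabilities is compared against a threshold. The data layout and the 0.6 threshold are assumptions for illustration.

```python
import math

def assemble_value(boxes, allowed_per_position, threshold=0.6):
    """boxes: list of (x_position, {character: probability}) per detected char."""
    boxes = sorted(boxes, key=lambda b: b[0])                    # T6: order by x
    chars, probs = [], []
    for position, (_, scores) in enumerate(boxes):
        allowed = allowed_per_position(position)                 # T7: subset
        candidates = {c: p for c, p in scores.items() if c in allowed}
        if not candidates:
            return None
        best = max(candidates, key=candidates.get)               # T8: best char
        chars.append(best)
        probs.append(candidates[best])
    if math.prod(probs) < threshold:                             # T9: threshold
        return None
    return "".join(chars)

# Example: digits everywhere except a decimal point in the third position.
value = assemble_value(
    [(0, {"1": 0.95}), (10, {"2": 0.9, "Z": 0.4}),
     (20, {".": 0.99}), (30, {"5": 0.9})],
    lambda pos: {"."} if pos == 2 else set("0123456789"),
)
print(value)  # "12.5"
```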
[0094] During an activity T10, the result is presented to the user for confirmation, and manual corrections made by the user are received to improve the method, e.g. by adapting the filter kernels, by adapting the weights of certain neurons, and so on.
[0095] During an activity T11, the process is repeated from activity T1.
[0096] A mobile device 70 that could be configured to implement the method in part or in full is shown in FIG. 6. Mobile device 70 could be any hand-held device with computational capabilities, such as a smartphone, tablet, etc.
[0097] As mentioned above, the object and character recognition part of the method could be implemented in the back end on an image that was taken with the camera of the mobile device 70 or with another such device. The entire method could also be implemented on the mobile device 70 itself, with the retrieved values permanently stored on the mobile device 70. For example, instead of a mobile phone as shown in FIG. 6, a hand-held scanning or photographing device, equipped with processors and storage capacity adapted to the method, could be used to implement the method.
[0098] For example, with mobile device 70, the user takes a picture of a receipt, technical specification, or other such data and determines the type of information they wish to extract, e.g. the total amount. Then the user can activate the object/character recognition process and use the extracted information, for example, to automatically fill in a form such as an expense reimbursement form, a tax return, or another form of this kind.
[0099] An exemplary computing device for implementing the method, or at least parts of the method, is illustrated in FIG. 7.
[0100] Computer system 100 is arranged to execute a set of instructions on processor 102 to cause computer system 100 to perform tasks such as those described herein.
[0101] Computer system 100 includes processor 102, main memory 104, and network interface 108. Main memory 104 includes a user space, which is associated with user-executed applications, and a kernel space, which is reserved for the operating system as well as associated hardware and operating system applications. Computer system 100 further includes static memory 106, e.g. non-removable flash memory and/or a solid-state disk and/or a removable micro or mini SD card, which permanently stores the software that enables computer system 100 to perform its functions. Computer system 100 may further include a video display 110, an interface control module 114 and/or a cursor and alphanumeric input device 112. Optionally, additional I/O interfaces 116, such as a card reader and USB interfaces, may be present. The computer system components 102 through 116 are interconnected by a data bus 118.
Claims:
Claims (14)
[0001] A method for automatically extracting information of a predefined type from a document, the method comprising: using an object detection algorithm to identify at least one segment of the document which probably includes the predefined type information; constructing at least one bounding box corresponding to said at least one segment; and, in response to identifying a bounding box presumably comprising the predefined type information, extracting the information comprised by that bounding box.
[0002] The method of claim 1, wherein a character identification algorithm is used to extract the predefined type information in said at least one bounding box.
[0003] The method of claim 2, wherein the character identification algorithm used to extract the predefined type information is configured to use characteristics of said predefined type information to recognize the predefined type information.
[0004] The method of claim 3, wherein the used characteristics of said information of a predefined type include at least one of a comma or decimal point position and a number format.
[0005] The method of claim 4, wherein each different layer of a multi-layered neural network serves to identify different features of the document.
[0006] The method of claim 5, wherein a first layer of the neural network is assigned to detect rudimentary shapes on the document and wherein a second layer of the neural network is assigned to abstract object detection.
[0007] The method of claim 5 or 6, wherein the neural network is a neural network compatible with a decision layer, the decision layer being a neural network layer which serves to detect at least one of (i) a position of a bounding box, (ii) the height and width of a bounding box, and (iii) a classification score indicating a classification of a detected character.
[0008] The method of any one of claims 1 to 7, wherein a convolutional neural network, in particular a fully convolutional neural network, is used by the object detection algorithm and/or the character identification algorithm.
[0009] The method of any one of claims 5 to 8, wherein the method includes a training activity to train the neural network with a plurality of documents to correctly extract the predefined type information.
[0010] The method of any one of claims 1 to 9, wherein a probability value is assigned to a bounding box, the probability value being indicative of the probability that a certain bounding box contains the predefined type information.
[0011] The method of any one of claims 2 to 10, wherein a probability value is assigned to a character identified by a character identification algorithm, the probability value being indicative of the probability that the identified character is identical to a character actually comprised by the predefined type information.
[0012] The method of claim 11, wherein a probability value assigned to a bounding box and probability values assigned to characters in that bounding box are used to provide a combined confidence score.
[0013] A computer system, the computer system comprising at least one computer arranged to execute the method according to any one of claims 1 to 12.
[0014] A computer program product comprising program code instructions stored on a computer-readable medium for carrying out the method steps according to any one of claims 1 to 12 when said program runs on a computer.
Similar technologies:
Publication number | Publication date | Title
FR3098328A1 | 2021-01-08 | Process for automatically extracting information of a predefined type from a document
Mittal et al., 2016 | Spotgarbage: smartphone app to detect garbage using deep learning
US9984471B2 | 2018-05-29 | Label and field identification without optical character recognition
US20190130232A1 | 2019-05-02 | Font identification from imagery
CA2900818C | 2020-03-31 | Systems and methods for tax data capture and use
CN104299006A | 2015-01-21 | Vehicle license plate recognition method based on deep neural network
BE1025504B1 | 2019-03-27 | Pattern recognition system
US20190385054A1 | 2019-12-19 | Text field detection using neural networks
Yang et al., 2016 | Recapture image forensics based on Laplacian convolutional neural networks
Nakamura et al., 2017 | Scene text eraser
EP1525553A2 | 2005-04-27 | Method and system for automatically locating text areas in an image
US11138423B2 | 2021-10-05 | Region proposal networks for automated bounding box detection and text segmentation
US20210012145A1 | 2021-01-14 | System and method for multi-modal image classification
Thakur et al., 2019 | Geometrical attack classification using DCNN and forgery localization using machine learning
Pathak et al., 2020 | Mobile-Based Indian Currency Detection Model for the Visually Impaired
Rafi et al., 2020 | L2-Constrained RemNet for Camera Model Identification and Image Manipulation Detection
US20210034700A1 | 2021-02-04 | Region proposal networks for automated bounding box detection and text segmentation
US10896357B1 | 2021-01-19 | Automatic key/value pair extraction from document images using deep learning
Kang et al., 2018 | Inception network-based weather image classification with pre-filtering process
Chowdhury et al., 2021 | Bangla number plate recognition from noisy video footage using deep learning
Ayman et al., 2018 | An Automatic System for Water Meter Index Reading
Sindhu et al., 2019 | Toll Plazza Management System Using Text Detection and Text Recognization Techniques
Velasco Rodríguez, 2022 | Study on the application of different image preprocessing algorithms in image segmentation using deep learning techniques
Thakkar et al., 2020 | Spot a Spot—Efficient Parking System Using Single-Shot MultiBox Detector
Manjula et al., 2022 | Implementation of Smart Parking Application Using IoT and Machine Learning Algorithms
Patent family:
Publication number | Publication date
EP3761224A1 | 2021-01-06
FR3098328B1 | 2022-02-04
US20210004584A1 | 2021-01-07
Cited references:
Publication number | Filing date | Publication date | Applicant | Title
US20190019020A1 | 2017-07-17 | 2019-01-17 | Open Text Corporation | Systems and methods for image based content capture and extraction utilizing deep learning neural network and bounding box detection training techniques
US11144715B2 | 2018-11-29 | 2021-10-12 | ProntoForms Inc. | Efficient data entry system for electronic forms
US11087163B2 | 2019-11-01 | 2021-08-10 | Vannevar Labs, Inc. | Neural network-based optical character recognition
US11210562B2 | 2019-11-19 | 2021-12-28 | Salesforce.Com, Inc. | Machine learning based models for object recognition
Legal status:
2020-07-22 | PLFP | Fee payment | Year of fee payment: 2
2021-01-08 | PLSC | Publication of the preliminary search report | Effective date: 20210108
2021-07-28 | PLFP | Fee payment | Year of fee payment: 3
Priority applications:
Application number | Publication number | Filing date | Title
FR1907252A | FR3098328B1 | 2019-07-01 | Method for automatically extracting information of a predefined type from a document
EP20178232.3A | EP3761224A1 | 2020-06-04 | Method of automatically extracting information of a predefined type from a document
US16/907,935 | US20210004584A1 | 2020-06-22 | Method of automatically extracting information of a predefined type from a document
申请号 | 申请日 | 专利标题 FR1907252|2019-07-01| FR1907252A|FR3098328B1|2019-07-01|2019-07-01|Method for automatically extracting information of a predefined type from a document|FR1907252A| FR3098328B1|2019-07-01|2019-07-01|Method for automatically extracting information of a predefined type from a document| EP20178232.3A| EP3761224A1|2019-07-01|2020-06-04|Method of automatically extracting information of a predefined type from a document| US16/907,935| US20210004584A1|2019-07-01|2020-06-22|Method of automatically extracting information of a predefined type from a document| 相关专利